YouTube API Design Evaluation and Latency Budget
Learn how our API meets the non-functional requirements and estimate the response time for streaming a video.
Introduction#
In the previous lesson, we learned how a streaming service's functional requirements are met. In this lesson, we’ll focus on some interesting aspects of the non-functional requirements of the design. We’ll try to answer some of the common questions that might have come to your mind regarding API performance.
Non-functional requirements#
The subsequent sections discuss how the non-functional requirements are met.
Scalability#
YouTube-like systems need to scale both horizontally and vertically. We provide loosely coupled services by executing independent tasks statelessly and in parallel. Since no single facility can serve numerous users requesting large videos simultaneously, YouTube populates CDNs and the Google Global Cache (GGC) through Internet exchange points to serve end users effectively and scale its services.
Availability #
In cases of load spikes, such as epidemic events, unexpected viral videos, or DDoS attacks, we fan out client requests by adding a queuing system, allowing servers to respond when they have free capacity, rather than processing them directly. This may add some response delay, but the system will remain available under these circumstances. We also offload some processing to the client machine, such as managing playtime and other events, by sending only the most necessary events to the server.
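The queuing idea above can be sketched in a few lines. This is an illustrative toy, not YouTube's implementation: a bounded queue absorbs a burst of requests, and a worker drains it at its own pace, trading a little response delay for availability.

```python
import queue
import threading

requests = queue.Queue(maxsize=100)   # bounded buffer that absorbs bursts
processed = []

def worker():
    while True:
        req = requests.get()
        if req is None:               # sentinel: shut the worker down
            break
        processed.append(req)         # stand-in for real request handling
        requests.task_done()

t = threading.Thread(target=worker)
t.start()

for i in range(10):                   # a sudden burst of 10 requests
    requests.put(f"request-{i}")

requests.join()                       # block until the backlog is drained
requests.put(None)
t.join()
print(len(processed))                 # all 10 requests eventually served
```

The clients' requests are admitted immediately (or rejected once the queue is full), while servers respond only when they have free capacity, which is exactly the availability-over-latency tradeoff described above.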
Additionally, routing popular content through Content Delivery Networks (CDNs) allows us to reduce latency, avoid single points of failure, and increase fault tolerance. Below, we provide the CDN workflow for a client requesting a video trailer. We show how YouTube is able to redirect users to the CDN that serves images under the domain of www.ytimg.com (YouTube images):
Flexibility/adaptability #
Our API supports a wide variety of consumers such as TVs, mobile phones, desktops, etc., and we make sure our services are flexible enough to work across these different devices by transcoding audio/video chunks into the appropriate formats. We also support adaptive bitrate and buffering based on device capabilities to improve user playback experience. Additionally, user playback history is synced to keep logged-in users consistent across devices.
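The adaptive-bitrate selection mentioned above can be sketched as follows. The bitrate ladder here is hypothetical (real ladders come from the manifest's adaptation sets), and the 0.8 safety margin is an assumption:

```python
# Hypothetical bitrate ladder (kbps); not YouTube's actual figures.
LADDER = {"144p": 100, "360p": 400, "720p": 1500, "1080p": 3000, "4k": 15000}

def pick_representation(measured_kbps: float, safety: float = 0.8) -> str:
    """Pick the highest rendition whose bitrate fits within a safety
    margin of the measured bandwidth; fall back to the lowest."""
    budget = measured_kbps * safety
    best = min(LADDER, key=LADDER.get)            # default: lowest quality
    for name, kbps in LADDER.items():
        if kbps <= budget and kbps > LADDER[best]:
            best = name
    return best

print(pick_representation(2500))   # 2500 * 0.8 = 2000 kbps budget -> "720p"
```

The client re-runs this choice for every segment, which is what lets playback quality track bandwidth and congestion over time.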
Security #
We assume that the majority of requests sent to YouTube are from unauthenticated users, and we identify these types of requests by using the API key and the session created when the user plays their first video. YouTube may also have private and public content. To support access to private content, we have a login mechanism for user authorization and authentication. Also, for third-party clients (sites, applications, etc., that embed YouTube players), we allow authorization by using OAuth and OpenID Connect code authentication with the PKCE flow.
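The PKCE piece of that third-party flow can be sketched in a few lines. This shows only the verifier/challenge derivation standardized in RFC 7636, not the full OAuth exchange:

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """Generate a PKCE code_verifier and its S256 code_challenge.

    The client sends the challenge with the authorization request and
    the verifier with the token request; the server re-hashes the
    verifier and compares it to the stored challenge.
    """
    verifier = secrets.token_urlsafe(32)          # random per-session secret
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge

verifier, challenge = make_pkce_pair()

# Server-side check during the token exchange:
recomputed = base64.urlsafe_b64encode(
    hashlib.sha256(verifier.encode("ascii")).digest()
).rstrip(b"=").decode("ascii")
assert recomputed == challenge
```

Because the verifier never leaves the client until the token request, an attacker who intercepts the authorization code alone cannot redeem it.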
Note: Our discussion on the optimization of the file upload API service has some commonalities with the YouTube API service because they deal with uploading (large) files. To avoid repetition, we recommend going through that content as well.
Low latency#
For a streaming service like YouTube, we should expect more read requests than write requests, which means more users watch videos than upload them to their channel. Note that we use a write-expensive approach where the data is processed while the video is being uploaded. Hence, users can directly retrieve the preprocessed data, and reading data is much faster than writing data. We further improve performance by reducing latency using the following techniques:
- Caching: `ETag`s are used to identify a specific version of a response object. When we transfer data in segments (streams), we can consider each segment a different version of the same object, which can be cached using `ETag` values (also see the caching at different layers lesson).
- Adaptive bitrate: This allows clients to intelligently request segments of the specific quality (144p–4K) they need, as per the available bandwidth and network congestion.
- Prefetching: This helps achieve a smooth user experience and adds a cushion for buffering content and withstanding delays.
- Compression: Clients can request a compressed response to save some bandwidth by adding `Accept-Encoding: gzip, deflate, br` to the request.
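Two of the techniques above, `ETag`-based caching and compression negotiation, compose naturally in a single conditional GET. The sketch below uses a toy server stub to show the 304 path; the header names are real HTTP, everything else is illustrative:

```python
def build_segment_request(etag=None):
    """Build request headers for fetching a segment, advertising
    compression support and, if we hold a cached copy, its ETag."""
    headers = {"Accept-Encoding": "gzip, deflate, br"}
    if etag:
        headers["If-None-Match"] = etag   # version of our cached copy
    return headers

def serve_segment(headers, current_etag):
    """Toy server stub: 304 Not Modified when the client's cached
    version still matches, otherwise 200 with a fresh body."""
    if headers.get("If-None-Match") == current_etag:
        return 304
    return 200

assert serve_segment(build_segment_request('"seg-42-v1"'), '"seg-42-v1"') == 304
assert serve_segment(build_segment_request(None), '"seg-42-v1"') == 200
```

A 304 response carries no body at all, so a cache hit costs roughly one round trip instead of a full segment download.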
Achieving Non-Functional Requirements

| Non-Functional Requirements | Approaches |
| --- | --- |
| Scalability | Stateless, loosely coupled services; horizontal and vertical scaling; CDNs and the Google Global Cache (GGC) populated via Internet exchange points |
| Availability | Queuing requests to absorb load spikes; offloading playback events to clients; CDNs to reduce single points of failure |
| Flexibility/adaptability | Transcoding for different devices; adaptive bitrate and buffering; syncing playback history across devices |
| Security | API keys and sessions for unauthenticated users; login-based authentication and authorization for private content; OAuth and OpenID Connect with PKCE for third-party clients |
| Low latency | Write-expensive preprocessing at upload time; ETag-based caching; adaptive bitrate; prefetching; compression |
Latency budget#
This section calculates the response time for fetching audio and video clips from the YouTube API. We can calculate the response time as follows:
1. Calculate the message size for the request and response.
2. Calculate the response time based on the estimated message size.
As discussed in the Back-of-the-envelope Calculations for Latency chapter, in the case of GET, the average request latency remains the same regardless of the data size (due to the small request size), while the time to download the response grows with the response size (per KB).
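The two-step estimate above can be written as a small helper. The constants are the ones implied by the calculator tables later in this lesson (a base round-trip time of roughly 190.5–271.5 ms, about 0.4 ms of transfer time per KB, and about 4 ms of processing time); treat them as assumptions, not measured figures:

```python
def response_time_ms(size_kb, rtt_ms, per_kb_ms=0.4, processing_ms=4.0):
    """Estimate total response time: network latency (base round trip
    plus per-KB transfer time) plus server processing time."""
    latency = rtt_ms + size_kb * per_kb_ms
    return latency + processing_ms

print(response_time_ms(35, rtt_ms=190.5))   # ~208.5 ms (minimum, manifest)
print(response_time_ms(35, rtt_ms=271.5))   # ~289.5 ms (maximum, manifest)
```

Because the request itself is tiny, only the response size feeds the per-KB term.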
The manifest file#
The YouTube client requests the manifest file through GET before streaming content. Let's assume the request and response message sizes for the manifest file.
- Request size: The `GET` request is assumed to be 1 KB since it will only contain a few parameters like video ID, user credentials (if required), client type, and so on.
- Response size: We assume that the response size is roughly 35 KB based on the data inside the manifest file. Usually, it includes adaptation sets, representations, durations, segment information, and so on, for the player to play the content.
Point to Ponder
Question
Can we also perform segmentation on the manifest file?
Yes, it is possible, although not common. To split the manifest file into multiple parts, we may have to add a small repeating piece of information to every chunk so that it can be tied back to the original video. These chunks can then be stitched together by the server, or clients can request them individually on demand.
Response time#
Considering the response size is 35 KB, the following calculation estimates the response time for obtaining the manifest file.
Response time calculator for the manifest file
| Parameter | Value | Unit |
| --- | --- | --- |
| Response size | 35 | KB |
| Minimum latency | 204.5 | ms |
| Maximum latency | 285.5 | ms |
| Minimum response time | 208.5 | ms |
| Maximum response time | 289.5 | ms |
Assuming the response size is 35 KB, the latency is calculated by:

Latency = base time + (response size × transfer time per KB)

Similarly, the response time is calculated using the following equation:

Response time = latency + processing time

For the minimum response time, we use the minimum values of base time and processing time:

Minimum response time = 190.5 + (35 × 0.4) + 4 = 208.5 ms

For the maximum response time, we use the maximum values of base time and processing time:

Maximum response time = 271.5 + (35 × 0.4) + 4 = 289.5 ms
Audio and video segments#
Audio and video segments are also retrieved using the GET method, and the player will receive multiple clips simultaneously due to the HTTP multiplexing feature. Let's calculate the average response time for one segment under the following heading.
Request and response size#
Let's assume that, on average, we receive a video segment of 1560.78 KB and an audio segment of 432.76 KB. For the request size, we can reuse the 1 KB estimate from the manifest file example, since all of these are standard GET requests and their sizes don't vary much.
Response time#
Let's take a video segment as an example, since it is larger and therefore dominates the overall user-perceived latency. The following calculator gives the minimum and maximum response times for a video clip.
Response time calculator to obtain the video segment
| Parameter | Value | Unit |
| --- | --- | --- |
| Response size | 1560.78 | KB |
| Minimum latency | 814.812 | ms |
| Maximum latency | 895.812 | ms |
| Minimum response time | 818.812 | ms |
| Maximum response time | 964.812 | ms |
A summary of the overall latency budget for streaming video segments using our YouTube API is shown in the illustration below:
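Summing the calculator values gives a rough startup budget. The assumption (stated earlier) is that audio and video segments arrive in parallel over a multiplexed connection, so only the larger video segment bounds the total:

```python
# (min, max) response times in ms, taken from this lesson's calculators.
manifest = (208.5, 289.5)
video_segment = (818.812, 964.812)   # audio fetched in parallel, smaller

startup_min = manifest[0] + video_segment[0]
startup_max = manifest[1] + video_segment[1]
print(f"~{startup_min:.3f} to ~{startup_max:.3f} ms before playback starts")
```

So roughly one to one and a quarter seconds elapse between the first request and the first playable video segment under these estimates.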
Optimization and tradeoffs#
Real-time transmissions, such as live broadcasts and short clips, where users can quickly swap videos, can be tricky to manage because they have a very low tolerance for latency. Here, we can achieve low latency using the following techniques:
Compromising video quality: We can send the lower resolution first, and, when we have buffered some playback time, we can send the high-resolution segments.
Reducing the segment length: We can send high-resolution fragments by reducing the segment length to receive small-sized, high-quality segments in real time. However, reducing the segment length can cause performance degradation in the compression algorithm. This is because large-sized segments can have a higher degree of redundancy and, therefore, result in better compression.
Prefetching the next segment: Instead of waiting for the current video to finish playing, we can prefetch the next video to be played in advance. However, this may not reduce the latency of the first video, but the latency of subsequent videos can be reduced.
We can implement all of the above techniques by taking a hybrid approach and making tradeoffs between video quality, fragment size, and buffer size to prefetch segments before playback.
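The segment-length tradeoff above can be made concrete with back-of-the-envelope arithmetic: a live player typically buffers a few segments before it starts playing, so the delay behind the live edge scales with segment duration. The buffer depth of 3 segments is an assumption for illustration, not a YouTube figure:

```python
def live_delay_s(segment_len_s, buffered_segments=3):
    """Approximate delay behind the live edge: the player holds a few
    full segments in its buffer before playback begins."""
    return segment_len_s * buffered_segments

for seg in (6, 2, 1):
    print(f"{seg}s segments -> ~{live_delay_s(seg)}s behind live")
```

Shrinking segments from 6 s to 1 s cuts the delay from ~18 s to ~3 s in this model, which is exactly why shorter segments help real-time streams despite the compression penalty noted above.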
Points to Ponder
Question 3
How can we keep the buffering time of the initial segments to a minimum?
We can reduce buffering time by:
- Pushing the initial audio or video segments along with the manifest file, without waiting for the player to request them again.
- Sending low-resolution segments initially, and then improving the quality as per the client bandwidth.
- Prefetching the initial segments by intelligently guessing the next video to be played.
Note: These tricks work fine most of the time, but they can sometimes backfire. For example, if we prefetch the initial segments in a low resolution but the user sets the video quality to the highest available resolution, the prefetched data would be wasted.
In this lesson, we discussed how our API meets the non-functional requirements described in the first lesson of this chapter. We also estimated the average response time for our API. Lastly, we went through some scenarios where our API could be adapted to handle near-real-time events. In the next lesson, we'll exercise what we learned via a quiz on TikTok (a social media platform centered on short-form video streaming).